Efficient posterior profiles¶
Marco Raveri (marco.raveri@unige.it), Cyrille Doux (doux@lpsc.in2p3.fr), Shivam Pandey (shivampcosmo@gmail.com)
In this notebook we show how to obtain posterior profiles from synthetic probability models, as in Raveri, Doux and Pandey (2024), arXiv:XXXX.XXXX.
If you want more details on how to build normalizing flow based synthetic models for posterior distributions check out the corresponding example notebook.
Notebook setup:¶
We start by importing everything and setting up a controlled example:
# Show plots inline, and load main getdist plot module and samples class
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2
# import libraries:
import sys, os
sys.path.insert(0,os.path.realpath(os.path.join(os.getcwd(),'../..')))
from getdist import plots, MCSamples
from getdist.gaussian_mixtures import GaussianND
import getdist
getdist.chains.print_load_details = False
import scipy
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# tensorflow imports:
import tensorflow as tf
import tensorflow_probability as tfp
# import the tensiometer tools that we need:
import tensiometer
from tensiometer import utilities
from tensiometer.synthetic_probability import synthetic_probability as synprob
# getdist settings to ensure consistency of plots:
getdist_settings = {'ignore_rows': 0.0,
'smooth_scale_2D': 0.3,
'smooth_scale_1D': 0.3,
}
2024-09-14 12:14:49.118508: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
We build here a random Gaussian mixture model that we are going to use for tests.
The example is seeded so that it is reproducible. If you want a different example, change the value of the seed.
You can also change the dimensionality and the number of modes.
# define the parameters of the problem:
dim = 6
num_gaussians = 3
num_samples = 10000
# we seed the random number generator to get reproducible results:
seed = 100
np.random.seed(seed)
# we define the range for the means and covariances:
mean_range = (-0.5, 0.5)
cov_scale = 0.4**2
# generate means and covs:
means = np.random.uniform(mean_range[0], mean_range[1], num_gaussians*dim).reshape(num_gaussians, dim)
weights = np.random.rand(num_gaussians)
weights = weights / np.sum(weights)
covs = [cov_scale*utilities.vector_to_PDM(np.random.rand(int(dim*(dim+1)/2))) for _ in range(num_gaussians)]
# cast to required precision:
means = means.astype(np.float32)
weights = weights.astype(np.float32)
covs = [cov.astype(np.float32) for cov in covs]
# initialize distribution:
distribution = tfp.distributions.Mixture(
cat=tfp.distributions.Categorical(probs=weights),
components=[
tfp.distributions.MultivariateNormalTriL(loc=_m, scale_tril=tf.linalg.cholesky(_c))
for _m, _c in zip(means, covs)
],
name='Mixture'
)
# sample the distribution:
samples = distribution.sample(num_samples).numpy()
# calculate log posteriors:
logP = distribution.log_prob(samples).numpy()
# create MCSamples from the samples:
chain = MCSamples(samples=samples,
settings=getdist_settings,
loglikes=-logP,
name_tag='Mixture',
)
# we want to find the maximum posterior point:
_temp_maxima, _temp_max_value = [], []
for _m in means:
# maximize the likelihood starting from all the means:
_max = scipy.optimize.minimize(lambda x: -distribution.log_prob(x).numpy(), _m, method='Nelder-Mead')
# this usually converges to the nearest mode:
_temp_maxima.append(_max.x)
_temp_max_value.append(-_max.fun)
maximum = _temp_maxima[np.argmax(_temp_max_value)]
maximum_value = _temp_max_value[np.argmax(_temp_max_value)]
# we make a sanity check plot:
g = plots.get_subplot_plotter()
g.triangle_plot(chain, filled=True, markers=maximum)
As we can see, this example beautifully showcases projection effects, especially in $p_5$. As discussed in the synthetic probability notebook, the low peak in the 1D marginal of $p_5$ is the actual full-dimensional peak.
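Why this happens can be sketched with a self-contained numpy/scipy example (illustrative toy numbers, not the mixture above): a mode that is narrow along a direction being integrated out loses volume in the marginal, even though it is the point of highest full-dimensional density.

```python
import numpy as np
from scipy import stats

# two modes: a broad one and one that is narrow in the y direction
broad = stats.multivariate_normal(mean=[0.0, 0.0], cov=np.diag([1.0, 1.0]))
narrow = stats.multivariate_normal(mean=[3.0, 0.0], cov=np.diag([1.0, 0.05**2]))
w_broad, w_narrow = 0.9, 0.1

def pdf(pts):
    return w_broad * broad.pdf(pts) + w_narrow * narrow.pdf(pts)

# full-dimensional density at each mode:
dens_broad, dens_narrow = pdf([0.0, 0.0]), pdf([3.0, 0.0])

# 1D marginal of x, integrating y out on a grid:
x = np.linspace(-4.0, 5.0, 91)
y = np.linspace(-4.0, 4.0, 4001)
X, Y = np.meshgrid(x, y, indexing='ij')
marginal = pdf(np.stack([X, Y], axis=-1)).sum(axis=1) * (y[1] - y[0])

x_broad, x_narrow = np.argmin(np.abs(x - 0.0)), np.argmin(np.abs(x - 3.0))
print(dens_narrow > dens_broad)                # True: the narrow mode is the full-D peak
print(marginal[x_narrow] < marginal[x_broad])  # True: yet its 1D marginal peak is lower
```

The narrow mode keeps its high density but integrates to little probability mass, which is exactly the projection effect seen in $p_5$.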
Flow training:¶
We now train a synthetic probability model. Note that we need a flow with good local accuracy since we are going to maximize its value.
If you are interested in how to build and control these types of flows, check out the synthetic probability tutorial.
kwargs = {
'feedback': 1,
'verbose': -1,
'plot_every': 1000,
'pop_size': 1,
'num_flows': 5,
'epochs': 300,
}
flow = synprob.average_flow_from_chain(chain, # parameter difference chain
**kwargs)
Warning: MPI is incompatible with no cache. Disabling MPI.
Training flow 0
0epoch [00:00, ?epoch/s]
Training flow 1
0epoch [00:00, ?epoch/s]
Training flow 2
0epoch [00:00, ?epoch/s]
Training flow 3
0epoch [00:00, ?epoch/s]
Training flow 4
0epoch [00:00, ?epoch/s]
# we need to check metrics to make sure everything looks good:
flow.print_training_summary()
Number of flows: 5
Flow weights : [0.22 0.2 0.2 0.2 0.19]
loss : [7.27 7.28 7.34 7.33 7.34]
val_loss : [7.4 7.45 7.49 7.49 7.54]
lr : [0. 0. 0. 0. 0.]
chi2Z_ks : [0.04 0.04 0.05 0.04 0.06]
chi2Z_ks_p : [0.12 0.06 0.03 0.06 0. ]
loss_rate : [-0.01 -0. -0. 0. -0. ]
val_loss_rate: [-0.02 0. -0. -0. -0. ]
# then we need to check the estimate of the local accuracy:
ev, eer = flow.evidence()
print(f'log(Z) = {ev} +- {eer}')
log(Z) = 0.0792924091219902 +- 0.5120421051979065
It is usually good to check that the evidence error (if available) is under control, especially when we want to obtain posterior profiles.
The evidence error is the variance of the log-likelihood values on samples from the flow, and gives us a handle on the local accuracy of the flow.
Catastrophic initialization of the weights does happen; if this value is too high it might be worth re-running the flow training.
If you are training on marginals (without likelihood values available), it might be a good idea to add population selection as a layer of protection against bad initialization.
If you find that the results are not stable (especially in the bulk of the posterior), check out the notebook on synthetic probability modelling and the section discussing high-accuracy flows.
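The intuition behind this diagnostic can be sketched with plain scipy distributions (an illustration, not the tensiometer implementation): for samples drawn from a surrogate distribution, the scatter of the log-density difference with the target vanishes when the surrogate is exact, and grows with the mismatch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
target = stats.norm(loc=0.0, scale=1.0)

# a perfect surrogate: log target - log surrogate is constant, zero scatter
good = stats.norm(loc=0.0, scale=1.0)
x = good.rvs(size=5000, random_state=rng)
scatter_good = np.std(target.logpdf(x) - good.logpdf(x))

# a mismatched surrogate: large scatter flags poor local accuracy
bad = stats.norm(loc=0.5, scale=1.5)
x = bad.rvs(size=5000, random_state=rng)
scatter_bad = np.std(target.logpdf(x) - bad.logpdf(x))

print(scatter_good, scatter_bad)  # ~0 vs O(1)
```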
Posterior profile:¶
We now want to calculate posterior profiles for our distribution. These are obtained by maximizing over all parameters but the ones being considered.
Having efficient flow models that we can both sample from and evaluate means we can afford many maximizations, enough to compute profiles up to 2D.
Note that a flow can be trained on marginal distributions, allowing us to combine profiling and marginalization.
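To make the operation concrete, here is a minimal sketch of a 1D profile with plain scipy on an analytic 2D Gaussian (an illustration, not the tensiometer implementation used below, which adds randomized search, gradient-descent pre-polishing and caching):

```python
import numpy as np
from scipy import optimize, stats

dist = stats.multivariate_normal(mean=[0.0, 1.0], cov=[[1.0, 0.8], [0.8, 2.0]])

def profile_logp(x0):
    # at fixed first parameter, maximize log P over the second parameter
    res = optimize.minimize_scalar(lambda y: -dist.logpdf([x0, y]))
    return -res.fun

grid = np.linspace(-2.0, 2.0, 9)
profile = np.array([profile_logp(v) for v in grid])
# for a Gaussian the profile peaks at the mean of the profiled parameter:
print(grid[np.argmax(profile)])  # 0.0
```

Each profile point is one (here one-dimensional) maximization; with many points and many parameter pairs, the total number of maximizations grows quickly, which is why fast flow evaluations matter.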
Posterior profiles can be easily obtained using the appropriate tensiometer class operating on the flow:
from tensiometer.synthetic_probability import flow_profiler
# define the options for the profiler:
profiler_options = {
'num_gd_interactions_1D': 100, # number of gradient descent iterations for the 1D profile
'num_gd_interactions_2D': 100, # number of gradient descent iterations for the 2D profile
'scipy_options': { # options for the scipy polishing minimizer
'ftol': 1.e-06,
'gtol': 0.0,
'maxls': 40,
},
'scipy_use_jac': True, # use the jacobian in the minimizer
'num_points_1D': 64, # number of points for the 1D profile
'num_points_2D': 32, # number of points per dimension for the 2D profile
'smooth_scale_1D': 0.2, # smoothing scale for the 1D profile
'smooth_scale_2D': 0.2, # smoothing scale for the 2D profile
}
# initialize the profiler:
flow_profile = flow_profiler.posterior_profile_plotter(flow,
initialize_cache=False, # usually we want to avoid initializing the cache for all the parameters
feedback=2, # we want high feedback to see the progress
)
Flow profiler
* use_scipy = True
* pre_polish = True
* polish = True
* smoothing = True
* box_prior = False
* parameters = ['param1', 'param2', 'param3', 'param4', 'param5', 'param6']
* feedback = 2
We have now initialized the profiler. If cache initialization is enabled, the code calculates 1D and 2D profiles for all parameter combinations. This is usually a lot of work, so we keep it separate in this tutorial and proceed with the profile calculation for just a few parameters:
# now we initialize the cache for the parameters we want to profile:
profile_params = ['param3', 'param5']
flow_profile.update_cache(params=profile_params,
**profiler_options)
* initializing profiler data.
* generating samples for profiler
- number of random search samples = 20000
- sampling the distribution
- time taken to sample the distribution 18.96 (s)
* finding MAP
* finding global best fit
- doing initial randomized search
- time taken for random initial selection 0.001234 (s)
- doing minimization (scipy)
- time taken for minimization 11.58 (s)
* initializing 1D profiles
1D profiles: 0%| | 0/2 [00:00<?, ?it/s]
* calculating the 1D profile for: param3 with index 2
- doing initial randomized search
- time taken for random algorithm 0.6413 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 2 / 62
- doing gradient descent pre-polishing
- time taken for gradient descents 34 (s)
at the end of descent after 100 steps 12 samples were still beeing optimized.
gradient descents did not improve 26 points
- doing minimization polishing (scipy)
- time taken for polishing 27.64 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.000479 (s)
- smoothing results
- time taken for smoothing 0.0008142 (s)
1D profiles: 50%|█████ | 1/2 [01:02<01:02, 62.28s/it]
* calculating the 1D profile for: param5 with index 4
- doing initial randomized search
- time taken for random algorithm 0.002459 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 4 / 60
- doing gradient descent pre-polishing
- time taken for gradient descents 32.97 (s)
at the end of descent after 100 steps 14 samples were still beeing optimized.
gradient descents did not improve 35 points
- doing minimization polishing (scipy)
- time taken for polishing 7.465 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.000658 (s)
- smoothing results
- time taken for smoothing 0.000942 (s)
1D profiles: 100%|██████████| 2/2 [01:42<00:00, 51.36s/it]
* initializing 2D profiles
2D profiles: 0%| | 0/1 [00:00<?, ?it/s]
* calculating the 2D profile for: param3, param5 with indexes 2 4
- doing initial randomized search
- time taken for random algorithm 0.4695 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 440 / 584
- doing gradient descent pre-polishing
- time taken for gradient descents 32.34 (s)
at the end of descent after 100 steps 50 samples were still beeing optimized.
gradient descents did not improve 175 points
- doing minimization polishing (scipy)
- time taken for polishing 54.23 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.006907 (s)
- smoothing results
- time taken for smoothing 0.001984 (s)
2D profiles: 100%|██████████| 1/1 [01:27<00:00, 87.05s/it]
As you can see, the profile calculation is involved and computationally intensive. For this reason caching is implemented and used throughout.
After calculating profiles, the result can be saved to file for effective caching:
# this command will save the profile to a pickle file:
#flow_profile.savePickle('flow_profile.pkl')
# note that the flow cannot be pickled easily and has its own save and load functions. This means you have to save it separately.
#flow_profile = flow_profiler.posterior_profile_plotter.loadPickle('flow_profile.pkl', flow)
The profiler class hijacks getdist MCSamples so that it can be directly used for getdist plotting as follows:
g = plots.get_subplot_plotter()
g.triangle_plot([flow_profile, flow.MCSamples(10000)],
params=profile_params,
markers=[flow_profile.flow_MAP[flow_profile.index[_p]] for _p in profile_params],
filled=False,
shaded=True,
diag1d_kwargs={'normalized':True},
legend_labels=['Profile','Marginal'])
As we can see, the projection effect in $p_5$ is correctly recovered: the mode that is smaller in the marginal is much higher in the profile.
Note that there might be a small discrepancy between the peaks of the profiles and the full-dimensional peak, while in principle there should be none. This is due to the finite resolution of the 1D and 2D profiles, which are typically computed on fairly coarse grids.
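The resolution caveat is easy to see with a tiny numpy example (illustrative numbers): a narrow peak evaluated on a coarse grid undershoots, and slightly shifts, the true maximum.

```python
import numpy as np

def peak(x):
    # narrow peak deliberately placed off the coarse grid nodes
    return np.exp(-0.5 * ((x - 0.16) / 0.05) ** 2)

coarse = np.linspace(-1.0, 1.0, 33)    # comparable to a 32-point profile grid
fine = np.linspace(-1.0, 1.0, 4097)
print(peak(coarse).max(), peak(fine).max())  # coarse max clearly below the fine max ~ 1
```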
The profiler class is fully interfaced with the getdist plotting facilities. Everything that was not previously cached is recomputed on the fly, as we can see in the following example:
plot_params = ['param1'] + profile_params
g = plots.get_subplot_plotter()
g.plots_1d([flow_profile, flow.MCSamples(10000)],
params=plot_params,
legend_labels=['Profile','Marginal'],
nx=3, normalized=True,
markers=[flow_profile.flow_MAP[flow_profile.index[_p]] for _p in plot_params])
* calculating the 1D profile for: param1 with index 0
* generating samples for profiler
- number of random search samples = 20000
- sampling the distribution
- time taken to sample the distribution 0.13 (s)
- doing initial randomized search
- time taken for random algorithm 0.002551 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 5 / 59
- doing gradient descent pre-polishing
- time taken for gradient descents 31.91 (s)
at the end of descent after 100 steps 7 samples were still beeing optimized.
gradient descents did not improve 40 points
- doing minimization polishing (scipy)
- time taken for polishing 7.7 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0004752 (s)
- smoothing results
- time taken for smoothing 0.001497 (s)
(3, 1)
Profile accuracy tests:¶
The profile calculation is demanding: as you have seen, it requires hundreds or thousands of minimizations.
In this section we investigate the reliability of the profile calculation, which we can do since - in this example - the exact distribution is available.
We provide methods to wrap a tensorflow or scipy distribution into a flow-like object so that it can be used for thorough tests.
# In this case we can compare with the profile obtained from the exact distribution:
from tensiometer.synthetic_probability import analytic_flow
exact_flow = analytic_flow.analytic_flow(analytic_flow.tf_prob_wrapper(distribution),
param_names=flow.param_names,
param_labels=flow.param_labels,
lims=flow.parameter_ranges)
exact_profile = flow_profiler.posterior_profile_plotter(exact_flow,
initialize_cache=False,
feedback=2 )
exact_profile.update_cache(params=profile_params,
**profiler_options)
Flow profiler
* use_scipy = True
* pre_polish = True
* polish = True
* smoothing = True
* box_prior = False
* parameters = ['param1', 'param2', 'param3', 'param4', 'param5', 'param6']
* feedback = 2
* initializing profiler data.
* generating samples for profiler
- number of random search samples = 20000
- sampling the distribution
- time taken to sample the distribution 0.1138 (s)
* finding MAP
* finding global best fit
- doing initial randomized search
- time taken for random initial selection 0.000356 (s)
- doing minimization (scipy)
- time taken for minimization 110.8 (s)
* initializing 1D profiles
1D profiles: 0%| | 0/2 [00:00<?, ?it/s]
* calculating the 1D profile for: param3 with index 2
- doing initial randomized search
- time taken for random algorithm 0.002177 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 0 / 64
- doing gradient descent pre-polishing
- time taken for gradient descents 21.62 (s)
at the end of descent after 1 steps 0 samples were still beeing optimized.
gradient descents did not improve 64 points
- doing minimization polishing (scipy)
Warning minimization failed for 52 points
but 12 samples were still better.
- time taken for polishing 71.43 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0003772 (s)
- smoothing results
- time taken for smoothing 0.0005367 (s)
1D profiles: 50%|█████ | 1/2 [01:33<01:33, 93.06s/it]
* calculating the 1D profile for: param5 with index 4
- doing initial randomized search
- time taken for random algorithm 0.002829 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 5 / 59
- doing gradient descent pre-polishing
- time taken for gradient descents 18.85 (s)
at the end of descent after 1 steps 0 samples were still beeing optimized.
gradient descents did not improve 59 points
- doing minimization polishing (scipy)
Warning minimization failed for 43 points
but 16 samples were still better.
- time taken for polishing 78.73 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0004132 (s)
- smoothing results
- time taken for smoothing 0.0006373 (s)
1D profiles: 100%|██████████| 2/2 [03:10<00:00, 95.32s/it]
* initializing 2D profiles
2D profiles: 0%| | 0/1 [00:00<?, ?it/s]
* calculating the 2D profile for: param3, param5 with indexes 2 4
- doing initial randomized search
- time taken for random algorithm 0.005429 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 407 / 617
- doing gradient descent pre-polishing
- time taken for gradient descents 406.7 (s)
at the end of descent after 2 steps 0 samples were still beeing optimized.
gradient descents did not improve 615 points
- doing minimization polishing (scipy)
Warning minimization failed for 492 points
but 125 samples were still better.
- time taken for polishing 731.9 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.006792 (s)
- smoothing results
- time taken for smoothing 0.001295 (s)
2D profiles: 100%|██████████| 1/1 [18:58<00:00, 1138.60s/it]
We now plot the profiles.
We plot the 1D distributions on a logarithmic scale to appreciate the flow accuracy in the tails. Note that the flow is optimized on log-probabilities, so it is expected to do a fairly good job across orders of magnitude in probability. On the other hand, small errors in log space are large errors in probability, hence the requirement of high overall accuracy.
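As a quick numeric check of that last point (illustrative numbers): an absolute error in log-probability is a multiplicative error in probability, so even a modest log-space offset distorts probabilities appreciably.

```python
import numpy as np

log_error = 0.5               # half a nat of log-space error
ratio = np.exp(log_error)     # multiplicative error in probability
rel_error = ratio - 1.0       # ~65% relative error on any probability value
print(ratio, rel_error)
```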
g = plots.get_subplot_plotter()
g.triangle_plot([flow_profile, exact_profile],
params=profile_params,
filled=False,
shaded=True,
diag1d_kwargs={'normalized':True},
markers=[exact_profile.flow_MAP[flow_profile.index[_p]] for _p in profile_params],
legend_labels=['Flow Profile','Exact Profile'])
# log axis on the diagonal:
for _i in range(len(profile_params)):
_ax = g.subplots[_i, _i]
_ax.set_yscale('log')
_ax.set_ylim(1.e-5, 1.e1)
_ax.set_ylabel('$\\log_{10}(P)$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
If we trained average flows, we can also estimate the error on the profile as the variance of the log-probabilities across the member flows. This is what we do here:
g = plots.get_subplot_plotter()
g.triangle_plot([flow_profile, exact_profile], params=profile_params,
filled=False, shaded=True, diag1d_kwargs={'normalized':True},
markers=[flow_profile.flow_MAP[flow_profile.index[_p]] for _p in profile_params],
legend_labels=['Flow Profile','Exact Profile'])
# add error bar on the diagonal:
for _i in range(len(profile_params)):
# call the method that computes the variance of the profile for an average flow:
_x, _prob, _temp_std = flow_profile.get_1d_profile_variance(profile_params[_i])
# do the plotting:
_ax = g.subplots[_i, _i]
_ax.plot(_x, _prob, color='k', linestyle='-', label='True')
_ax.fill_between(_x, _prob - _temp_std, _prob + _temp_std, color='k', alpha=0.2)
_ax.set_ylabel('$P$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
WARNING:tensorflow:5 out of the last 1120 calls to <function FlowCallback.log_probability at 0x1d0ba48b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:6 out of the last 1121 calls to <function FlowCallback.log_probability at 0x1d1c40b80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Real world application: cosmological parameter profiles¶
In this section we show a real example of profiles applied to cosmological parameter posteriors.
In this case it is particularly interesting to combine profiling and marginalization. The full parameter space of the model is large - of order 30 dimensions - but most of these parameters describe systematic effects. Since we are generally not interested in their values, we can marginalize over them and then profile the cosmological parameters.
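Schematically, combining the two operations looks like this on a toy 3D density (plain numpy/scipy with hypothetical toy numbers, unrelated to the DES chain used below): integrate the density over the nuisance direction first, then profile one of the remaining parameters.

```python
import numpy as np
from scipy import stats

# toy 3D density: two "cosmological" parameters (a, b) and one nuisance (nu)
a = np.linspace(-3.0, 3.0, 61)
b = np.linspace(-3.0, 3.0, 61)
nu = np.linspace(-3.0, 3.0, 61)
A, B, NU = np.meshgrid(a, b, nu, indexing='ij')
pdf = stats.norm.pdf(A) * stats.norm.pdf(B - 0.5 * A) * stats.norm.pdf(NU - A)

pdf_marg = pdf.sum(axis=2) * (nu[1] - nu[0])  # marginalize the nuisance direction
profile_a = pdf_marg.max(axis=1)              # then profile over b
print(a[np.argmax(profile_a)])                # the profile of a peaks at a = 0
```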
# we start by loading up the posterior:
# load the samples (remove no burn in since the example chains have already been cleaned):
chains_dir = './../../test_chains/'
# the DES Y1 3x2 chain:
data_chain = getdist.mcsamples.loadMCSamples(file_root=chains_dir+'DES', no_cache=True, settings=getdist_settings)
# let's add omegab as a derived parameter:
for _ch in [data_chain]:
_p = _ch.getParams()
_h = _p.H0 / 100.
_ch.addDerived(_p.omegabh2 / _h**2, name='omegab', label='\\Omega_b')
_ch.updateBaseStatistics()
# we define the parameters of the problem:
param_names = ['H0', 'omegam', 'sigma8', 'ns', 'omegab']
# we then train the flows on the base parameters that we want to use:
kwargs = {
'feedback': 1,
'verbose': -1,
'plot_every': 1000,
'pop_size': 1,
'num_flows': 5,
'epochs': 500,
}
# actual flow training (note caching):
data_flow = synprob.average_flow_from_chain(data_chain,
param_names=param_names,
**kwargs)
# plot to make sure training went well:
data_flow.training_plot()
Warning: MPI is incompatible with no cache. Disabling MPI.
Training flow 0
0epoch [00:00, ?epoch/s]
Training flow 1
0epoch [00:00, ?epoch/s]
Training flow 2
0epoch [00:00, ?epoch/s]
Training flow 3
0epoch [00:00, ?epoch/s]
Training flow 4
0epoch [00:00, ?epoch/s]
# sanity check triangle plot:
g = plots.get_subplot_plotter()
g.triangle_plot([data_chain, data_flow.MCSamples(20000, settings=getdist_settings)],
params=param_names,
filled=False)
# define the options for the profiler:
profiler_options = {
'num_gd_interactions_1D': 100, # number of gradient descent iterations for the 1D profile
'num_gd_interactions_2D': 100, # number of gradient descent iterations for the 2D profile
'scipy_options': { # options for the polishing minimizer
'ftol': 1.e-06,
'gtol': 0.0,
'maxls': 100,
},
'scipy_use_jac': True, # use the jacobian in the minimizer
'num_points_1D': 64, # number of points for the 1D profile
'num_points_2D': 32, # number of points per dimension for the 2D profile
'smooth_scale_1D': 0.2, # smoothing scale for the 1D profile
'smooth_scale_2D': 0.2, # smoothing scale for the 2D profile
}
# initialize the profiler:
data_flow_profile = flow_profiler.posterior_profile_plotter(data_flow,
initialize_cache=False, # usually we want to avoid initializing the cache for all the parameters
feedback=2, # we want high feedback to see the progress
)
Flow profiler
* use_scipy = True
* pre_polish = True
* polish = True
* smoothing = True
* box_prior = False
* parameters = ['H0', 'omegam', 'sigma8', 'ns', 'omegab']
* feedback = 2
# now we initialize the cache for the parameters we want to profile:
profile_params = ['omegam', 'sigma8', 'ns']
data_flow_profile.update_cache(params=profile_params,
**profiler_options)
* initializing profiler data.
* generating samples for profiler
- number of random search samples = 20000
- sampling the distribution
- time taken to sample the distribution 0.08919 (s)
* finding MAP
* finding global best fit
- doing initial randomized search
- time taken for random initial selection 0.000216 (s)
- doing minimization (scipy)
- time taken for minimization 16.47 (s)
* initializing 1D profiles
1D profiles: 0%| | 0/3 [00:00<?, ?it/s]
* calculating the 1D profile for: omegam with index 1
- doing initial randomized search
- time taken for random algorithm 0.002643 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 9 / 55
- doing gradient descent pre-polishing
- time taken for gradient descents 29.37 (s)
at the end of descent after 100 steps 47 samples were still beeing optimized.
gradient descents did not improve 4 points
- doing minimization polishing (scipy)
- time taken for polishing 27.4 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0007741 (s)
- smoothing results
- time taken for smoothing 0.009004 (s)
1D profiles: 33%|███▎ | 1/3 [00:56<01:53, 56.80s/it]
* calculating the 1D profile for: sigma8 with index 2
- doing initial randomized search
- time taken for random algorithm 0.002571 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 10 / 54
- doing gradient descent pre-polishing
- time taken for gradient descents 31.78 (s)
at the end of descent after 100 steps 45 samples were still beeing optimized.
gradient descents did not improve 3 points
- doing minimization polishing (scipy)
Warning minimization failed for 1 points
but 53 samples were still better.
- time taken for polishing 7.621 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0004289 (s)
- smoothing results
- time taken for smoothing 0.009124 (s)
1D profiles: 67%|██████▋ | 2/3 [01:36<00:46, 46.58s/it]
* calculating the 1D profile for: ns with index 3
- doing initial randomized search
- time taken for random algorithm 0.002623 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 0 / 64
- doing gradient descent pre-polishing
- time taken for gradient descents 28.92 (s)
at the end of descent after 100 steps 59 samples were still beeing optimized.
- doing minimization polishing (scipy)
- time taken for polishing 4.866 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.0004807 (s)
- smoothing results
- time taken for smoothing 0.009169 (s)
1D profiles: 100%|██████████| 3/3 [02:10<00:00, 43.34s/it]
* initializing 2D profiles
2D profiles: 0%| | 0/3 [00:00<?, ?it/s]
* calculating the 2D profile for: omegam, sigma8 with indexes 1 2
- doing initial randomized search
- time taken for random algorithm 0.005982 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 872 / 152
- doing gradient descent pre-polishing
- time taken for gradient descents 35.95 (s)
at the end of descent after 100 steps 129 samples were still beeing optimized.
gradient descents did not improve 3 points
- doing minimization polishing (scipy)
Warning minimization failed for 1 points
but 151 samples were still better.
- time taken for polishing 18.85 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.01368 (s)
- smoothing results
- time taken for smoothing 0.02177 (s)
2D profiles: 33%|███▎ | 1/3 [00:55<01:50, 55.36s/it]
* calculating the 2D profile for: omegam, ns with indexes 1 3
- doing initial randomized search
- time taken for random algorithm 0.006957 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 463 / 561
- doing gradient descent pre-polishing
- time taken for gradient descents 37.28 (s)
at the end of descent after 100 steps 513 samples were still beeing optimized.
gradient descents did not improve 1 points
- doing minimization polishing (scipy)
Warning minimization failed for 1 points
but 560 samples were still better.
- time taken for polishing 59.98 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.04541 (s)
- smoothing results
- time taken for smoothing 0.02183 (s)
2D profiles: 67%|██████▋ | 2/3 [02:32<01:20, 80.07s/it]
* calculating the 2D profile for: sigma8, ns with indexes 2 3
- doing initial randomized search
- time taken for random algorithm 0.005732 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 402 / 622
- doing gradient descent pre-polishing
- time taken for gradient descents 33.13 (s)
at the end of descent after 100 steps 214 samples were still beeing optimized.
gradient descents did not improve 2 points
- doing minimization polishing (scipy)
Warning minimization failed for 4 points
but 618 samples were still better.
- time taken for polishing 40.87 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.05512 (s)
- smoothing results
- time taken for smoothing 0.02736 (s)
2D profiles: 100%|██████████| 3/3 [03:46<00:00, 75.62s/it]
We can now plot the profiles.
Note that - in this case - since we have no evidence estimate available, it is crucial to train several flows to get an estimate of the variance of the profile. If the variance is large, it is usually a good idea to retrain.
g = plots.get_subplot_plotter()
g.triangle_plot([data_flow_profile, data_flow.MCSamples(10000)], params=profile_params,
filled=False, shaded=True, diag1d_kwargs={'normalized':True},
markers=[data_flow_profile.flow_MAP[data_flow_profile.index[_p]] for _p in profile_params],
legend_labels=['Profile','Marginal'])
# add error bar on the diagonal:
for _i in range(len(profile_params)):
_x, _prob, _temp_std = data_flow_profile.get_1d_profile_variance(profile_params[_i])
# do the plotting:
_ax = g.subplots[_i, _i]
_ax.plot(_x, _prob, color='k', linestyle='-', label='True')
_ax.fill_between(_x, _prob - _temp_std, _prob + _temp_std, color='k', alpha=0.2)
_ax.set_ylabel('$P$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
2D profiles are fairly expensive; 1D profiles, on the other hand, are fast enough that we can compute all of them for this example:
g = plots.get_subplot_plotter()
data_flow_profile.normalize()
g.plots_1d([data_flow_profile, data_flow.MCSamples(10000)],
params=param_names,
legend_labels=['Profile','Marginal'],
markers=[data_flow_profile.flow_MAP[data_flow_profile.index[_p]] for _p in param_names],
nx=5, share_y=True, normalize=False)
# add error bars:
for _i in range(len(param_names)):
_x, _prob, _temp_std = data_flow_profile.get_1d_profile_variance(param_names[_i], normalize_by='max')
# do the plotting:
_ax = g.subplots.flatten()[_i]
_ax.plot(_x, _prob, color='k', linestyle='-', label='True')
_ax.fill_between(_x, _prob - _temp_std, _prob + _temp_std, color='k', alpha=0.2)
_ax.set_ylabel('$P$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
* calculating the 1D profile for: H0 with index 0
* generating samples for profiler
- number of random search samples = 20000
- sampling the distribution
- time taken to sample the distribution 0.1709 (s)
- doing initial randomized search
- time taken for random algorithm 0.006587 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 4 / 60
- doing gradient descent pre-polishing
- time taken for gradient descents 33.78 (s)
at the end of descent after 100 steps 60 samples were still beeing optimized.
- doing minimization polishing (scipy)
- time taken for polishing 8.388 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.002446 (s)
- smoothing results
- time taken for smoothing 0.009229 (s)
* calculating the 1D profile for: omegab with index 4
- doing initial randomized search
- time taken for random algorithm 0.004736 (s)
- number of 1D bins 64
- number of empty/filled 1D bins 4 / 60
- doing gradient descent pre-polishing
- time taken for gradient descents 0.7429 (s)
at the end of descent after 100 steps 56 samples were still beeing optimized.
gradient descents did not improve 3 points
- doing minimization polishing (scipy)
- time taken for polishing 7.886 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.000802 (s)
- smoothing results
- time taken for smoothing 0.009893 (s)
Once all 1D profiles are computed, we can obtain best fits and error bars from profile likelihood (posterior, really) ratios. These are defined by posterior thresholds relative to the maximum.
To do so we have hijacked the getdist method getLikeStats.
likestats = data_flow_profile.getLikeStats()
print(likestats)
Best fit sample -log(Like) = -8.470257
Ln(mean 1/like)  = -2.058267
mean(-Ln(like))  =  5.698391
-Ln(mean like)   = 10.410295

parameter   bestfit         lower1          upper1          lower2          upper2
H0          9.8853745E+01   7.6280084E+01   1.0000000E+02   4.6630189E+01   1.0000000E+02   H_0
omegam      2.3442246E-01   1.9742227E-01   2.7733716E-01   1.6668578E-01   3.4495744E-01   \Omega_m
sigma8      9.1399807E-01   7.9663249E-01   9.9866977E-01   6.9561386E-01   1.1123157E+00   \sigma_8
ns          1.0552700E+00   8.3750000E-01   1.2000000E+00   8.0000000E-01   1.2000000E+00   n_s
omegab      7.0894562E-02   5.0739615E-02   8.1745317E-02   1.4963804E-02   9.6055642E-02   \Omega_b
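The thresholds used to define these intervals follow the standard profile-likelihood construction. As a sketch of the definitions (not of tensiometer internals): an N-sigma interval on a 1D profile collects all points whose log-posterior lies within a chi-squared quantile (one degree of freedom) of the maximum.

```python
from scipy import stats

# An N-sigma interval on a 1D profile collects all points with
# -2*ln(P/P_max) below a chi2 quantile with one degree of freedom.
# For df=1 the threshold works out to exactly n_sigma**2.
for n_sigma in (1, 2):
    mass = stats.norm.cdf(n_sigma) - stats.norm.cdf(-n_sigma)  # enclosed mass
    delta = stats.chi2.isf(1.0 - mass, df=1)                   # -2 Delta ln P threshold
    print(f'{n_sigma} sigma: mass = {mass:.5f}, threshold = {delta:.3f}')
```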
These can be compared to the original margestats. Note that these are obtained starting from the original chain that was fed to the flow:
margestats = data_flow.MCSamples(10000).getMargeStats()
print(margestats)
Marginalized limits: 0.68; 0.95; 0.99

parameter   mean            sddev           lower1          upper1          limit1   lower2          upper2          limit2   lower3          upper3          limit3
H0          8.4087493E+01   1.2223177E+01   7.9696869E+01   1.0000000E+02   <        6.0165112E+01   1.0000000E+02   <        5.1230732E+01   1.0000000E+02   <        H_0
omegam      2.5028135E-01   3.4633413E-02   2.1352620E-01   2.7304923E-01   two      1.8791904E-01   3.2067236E-01   two      1.7795739E-01   3.8192359E-01   two      \Omega_m
sigma8      8.7766194E-01   7.6083841E-02   8.0456436E-01   9.5080715E-01   two      7.2309828E-01   1.0246025E+00   two      6.6149426E-01   1.0816842E+00   two      \sigma_8
ns          9.9946500E-01   9.8250692E-02   8.8890904E-01   1.1078923E+00   two      8.0000000E-01   1.2000000E+00   none     8.0000000E-01   1.2000000E+00   none     n_s
omegab      5.8993048E-02   1.6321537E-02   4.7073399E-02   7.6888777E-02   two      2.3085702E-02   8.7427467E-02   two      1.3581850E-02   1.0385130E-01   two      \Omega_b
Let's use them to visualize constraints:
g = plots.get_subplot_plotter()
data_flow_profile.normalize()
g.plots_1d([data_flow_profile, data_flow.MCSamples(10000)],
params=param_names, legend_labels=['Profile','Marginal'],
markers=[data_flow_profile.flow_MAP[data_flow_profile.index[_p]] for _p in param_names],
nx=5, share_y=True, normalize=False)
# add error bars:
for _i in range(len(param_names)):
_ax = g.subplots.flatten()[_i]
_marge = margestats.parWithName(param_names[_i])
_like = likestats.parWithName(param_names[_i])
_ax.axvspan(_marge.limits[0].lower, _marge.limits[0].upper, color='r', alpha=0.2)
_ax.axvspan(_like.ND_limit_bot[0], _like.ND_limit_top[0], color='k', alpha=0.2)
Note that, in this example, confidence intervals obtained from the profile are more conservative than those obtained with marginalized statistics.
Having hijacked getdist MCSamples, if we query the profiler for MargeStats we get confidence intervals calculated with a given mass threshold (i.e. requiring that the region within the isocontour integrates to a given fraction of the total).
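To make the mass-threshold idea concrete, here is a hedged sketch (not the tensiometer implementation): given a 1D curve P(x) on a regular grid, lower a horizontal threshold until the region above it contains the requested fraction of the total integral, then report the corresponding interval.

```python
import numpy as np

# Hedged sketch of a mass-threshold interval: find the level t such that
# the region {P >= t} contains a fraction `level` of the total integral.
def mass_threshold_interval(x, prob, level=0.68):
    dx = x[1] - x[0]
    total = prob.sum() * dx
    for t in np.sort(prob)[::-1]:                 # lower the water level
        if prob[prob >= t].sum() * dx >= level * total:
            inside = x[prob >= t]
            return inside.min(), inside.max()
    return x.min(), x.max()

# sanity check on a Gaussian curve; should recover roughly +/- 1 sigma:
x = np.linspace(-5.0, 5.0, 2001)
prob = np.exp(-0.5 * x**2)
lo, hi = mass_threshold_interval(x, prob)
print(lo, hi)
```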
Let's compare the two:
likestats = data_flow_profile.getLikeStats()
margestats = data_flow_profile.getMargeStats()
g = plots.get_subplot_plotter()
data_flow_profile.normalize()
g.plots_1d([data_flow_profile],
params=param_names, legend_labels=['Profile'],
markers=[data_flow_profile.flow_MAP[data_flow_profile.index[_p]] for _p in param_names],
nx=5, share_y=True, normalize=False)
# add error bars:
for _i in range(len(param_names)):
_ax = g.subplots.flatten()[_i]
_marge = margestats.parWithName(param_names[_i])
_like = likestats.parWithName(param_names[_i])
_ax.axvspan(_marge.limits[0].lower, _marge.limits[0].upper, color='r', alpha=0.2)
_ax.axvspan(_like.ND_limit_bot[0], _like.ND_limit_top[0], color='k', alpha=0.2)
As we can see, the profile threshold is still more conservative. Note that this is a sign of a non-Gaussian distribution and is specific to this case.
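A toy illustration (unrelated to the flows above) of why profiles and marginals can differ for non-Gaussian distributions: in a 2D "funnel" density, whose width in y depends on x, the marginal of x integrates over the varying y volume while the profile only tracks the peak density, so the two curves peak at different places.

```python
import numpy as np

# A 2D "funnel": Gaussian in x, Gaussian in y with x-dependent width.
x = np.linspace(-3, 3, 601)
y = np.linspace(-6, 6, 1201)
X, Y = np.meshgrid(x, y, indexing='ij')
sigma_y = 3.2 + X                                 # y width grows with x
logP = -0.5 * X**2 - 0.5 * (Y / sigma_y)**2 - np.log(sigma_y)
P = np.exp(logP)

marginal = P.sum(axis=1)                          # integrate over y
profile = P.max(axis=1)                           # maximize over y
# the two curves peak at different values of x:
print('marginal peak:', x[np.argmax(marginal)])
print('profile peak: ', x[np.argmax(profile)])
```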
Best constrained parameters profiles¶
As discussed in XXX, projection effects arise either from genuine non-Gaussianities of the likelihood or from unconstrained parameter directions. This means that if we look at the best constrained directions (those that maximize the prior-to-posterior information gain, as discussed in arXiv:2112.05737), we have a chance of minimizing projection effects.
If you are interested in how to compute best constrained parameter combinations check out the corresponding notebook.
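As a hedged sketch of the Karhunen-Loeve (KL) idea behind best constrained directions (see the referenced notebook for the actual tensiometer machinery): solving the generalized eigenvalue problem between prior and posterior covariances ranks the linear parameter combinations by how much the posterior improves over the prior.

```python
import numpy as np
from scipy import linalg

# KL sketch: solve  C_prior v = lambda C_post v.  Modes with large lambda
# are the directions along which the posterior improves most over the
# prior (largest prior-to-posterior information gain).
C_prior = np.diag([10.0, 10.0])                   # wide, uncorrelated prior
C_post = np.array([[1.0, 0.9], [0.9, 1.0]])       # correlated posterior
eigval, eigvec = linalg.eigh(C_prior, C_post)     # symmetric-definite problem
print('variance improvement factors:', eigval)    # sorted ascending
```

Here the degenerate direction (1, -1) improves only modestly, while the well-constrained combination (1, 1) improves by a factor of 100.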
# let's add S8 as a derived parameter:
for _ch in [data_chain]:
_p = _ch.getParams()
_S8 = _p.sigma8 * _p.omegam**0.75
_ch.addDerived(_S8, name='S8', label='S_8\\equiv \\sigma_8 \\Omega_m^{0.75}')
_ch.updateBaseStatistics()
# note the slightly different definition of the S8 parameter - that is taken from the notebook
# we define the parameters of the problem:
param_names = ['H0', 'omegam', 'S8', 'ns', 'omegab']
# we then train the flows on the base parameters that we want to use:
kwargs = {
'feedback': 0,
'verbose': -1,
'plot_every': 1000,
'pop_size': 1,
'num_flows': 5,
'epochs': 500,
}
# actual flow training:
S8_data_flow = synprob.average_flow_from_chain(data_chain, param_names=param_names, **kwargs)
Warning: MPI is incompatible with no cache. Disabling MPI.
0epoch [00:00, ?epoch/s]
0epoch [00:00, ?epoch/s]
0epoch [00:00, ?epoch/s]
0epoch [00:00, ?epoch/s]
0epoch [00:00, ?epoch/s]
# sanity check triangle plot:
g = plots.get_subplot_plotter()
g.triangle_plot([data_chain, S8_data_flow.MCSamples(20000, settings=getdist_settings),
],
params=param_names,
filled=False)
# define the options for the profiler:
profiler_options = {
'num_gd_interactions_1D': 100, # number of gradient descent iterations for the 1D profile
'num_gd_interactions_2D': 100, # number of gradient descent iterations for the 2D profile
'scipy_options': { # options for the polishing minimizer
'ftol': 1.e-06,
'gtol': 0.0,
'maxls': 100,
},
'scipy_use_jac': True, # use the jacobian in the minimizer
'num_points_1D': 64, # number of points for the 1D profile
'num_points_2D': 32, # number of points per dimension for the 2D profile
'smooth_scale_1D': 0.2, # smoothing scale for the 1D profile
'smooth_scale_2D': 0.2, # smoothing scale for the 2D profile
}
# initialize the profiler:
S8_data_flow_profile = flow_profiler.posterior_profile_plotter(S8_data_flow,
initialize_cache=False, # usually we want to avoid initializing the cache for all the parameters
feedback=1, # we want high feedback to see the progress
)
# now we initialize the cache for the parameters we want to profile:
profile_params = ['S8']
S8_data_flow_profile.update_cache(params=profile_params,
**profiler_options)
Flow profiler
* use_scipy = True
* pre_polish = True
* polish = True
* smoothing = True
* box_prior = False
* parameters = ['H0', 'omegam', 'S8', 'ns', 'omegab']
* feedback = 1
* initializing profiler data.
* finding MAP
* initializing 1D profiles
1D profiles: 100%|██████████| 1/1 [01:14<00:00, 74.42s/it]
* initializing 2D profiles
2D profiles: 0it [00:00, ?it/s]
g = plots.get_subplot_plotter()
S8_data_flow_profile.normalize()
g.plots_1d([S8_data_flow_profile, S8_data_flow.MCSamples(10000)],
params=profile_params,
legend_labels=['Profile','Marginal'],
markers=[S8_data_flow_profile.flow_MAP[S8_data_flow_profile.index[_p]] for _p in profile_params],
nx=5, share_y=True, normalize=False)
# add error bars:
for _i in range(len(profile_params)):
_x, _prob, _temp_std = S8_data_flow_profile.get_1d_profile_variance(profile_params[_i], normalize_by='max')
# do the plotting:
_ax = g.subplots.flatten()[_i]
_ax.plot(_x, _prob, color='k', linestyle='-', label='True')
_ax.fill_between(_x, _prob - _temp_std, _prob + _temp_std, color='k', alpha=0.2)
_ax.set_ylabel('$P$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
Full profile triangle¶
By now you probably want to look at a full profile triangle plot...
Let's do it! It might take some time, but note that obtaining this in reasonable time was simply out of reach with other methods...
profile_params = data_flow.param_names
g = plots.get_subplot_plotter()
g.triangle_plot([data_flow_profile, data_flow.MCSamples(10000)], params=profile_params,
filled=False, shaded=True, diag1d_kwargs={'normalized':True},
markers=[data_flow_profile.flow_MAP[data_flow_profile.index[_p]] for _p in profile_params],
legend_labels=['Profile','Marginal'])
# add error bar on the diagonal:
for _i in range(len(profile_params)):
_x, _prob, _temp_std = data_flow_profile.get_1d_profile_variance(profile_params[_i])
# do the plotting:
_ax = g.subplots[_i, _i]
_ax.plot(_x, _prob, color='k', linestyle='-', label='True')
_ax.fill_between(_x, _prob - _temp_std, _prob + _temp_std, color='k', alpha=0.2)
_ax.set_ylabel('$P$')
_ax.tick_params(axis='y', which='both', labelright=True)
_ax.yaxis.set_label_position('right')
* calculating the 2D profile for: H0, omegam with indexes 0 1
- doing initial randomized search
- time taken for random algorithm 0.006056 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 577 / 447
- doing gradient descent pre-polishing
- time taken for gradient descents 38.2 (s)
at the end of descent after 100 steps 434 samples were still beeing optimized.
gradient descents did not improve 3 points
- doing minimization polishing (scipy)
- time taken for polishing 53.84 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.04279 (s)
- smoothing results
- time taken for smoothing 0.02475 (s)
* calculating the 2D profile for: H0, sigma8 with indexes 0 2
- doing initial randomized search
- time taken for random algorithm 0.007397 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 546 / 478
- doing gradient descent pre-polishing
- time taken for gradient descents 35.11 (s)
at the end of descent after 100 steps 432 samples were still beeing optimized.
gradient descents did not improve 3 points
- doing minimization polishing (scipy)
Warning minimization failed for 2 points
but 476 samples were still better.
- time taken for polishing 53.44 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.04357 (s)
- smoothing results
- time taken for smoothing 0.02298 (s)
* calculating the 2D profile for: H0, ns with indexes 0 3
- doing initial randomized search
- time taken for random algorithm 0.007205 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 258 / 766
- doing gradient descent pre-polishing
- time taken for gradient descents 35.94 (s)
at the end of descent after 100 steps 718 samples were still beeing optimized.
- doing minimization polishing (scipy)
Warning minimization failed for 5 points
but 761 samples were still better.
- time taken for polishing 77.53 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.07544 (s)
- smoothing results
- time taken for smoothing 0.02157 (s)
* calculating the 2D profile for: H0, omegab with indexes 0 4
- doing initial randomized search
- time taken for random algorithm 0.007136 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 480 / 544
- doing gradient descent pre-polishing
- time taken for gradient descents 39.65 (s)
at the end of descent after 100 steps 531 samples were still beeing optimized.
gradient descents did not improve 4 points
- doing minimization polishing (scipy)
Warning minimization failed for 1 points
but 543 samples were still better.
- time taken for polishing 70.13 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.085 (s)
- smoothing results
- time taken for smoothing 0.05727 (s)
* calculating the 2D profile for: omegam, omegab with indexes 1 4
- doing initial randomized search
- time taken for random algorithm 0.006835 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 657 / 367
- doing gradient descent pre-polishing
- time taken for gradient descents 36.66 (s)
at the end of descent after 100 steps 53 samples were still beeing optimized.
gradient descents did not improve 14 points
- doing minimization polishing (scipy)
Warning minimization failed for 16 points
but 351 samples were still better.
- time taken for polishing 38.53 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.03959 (s)
- smoothing results
- time taken for smoothing 0.02527 (s)
* calculating the 2D profile for: sigma8, omegab with indexes 2 4
- doing initial randomized search
- time taken for random algorithm 0.00618 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 625 / 399
- doing gradient descent pre-polishing
- time taken for gradient descents 34.73 (s)
at the end of descent after 100 steps 362 samples were still beeing optimized.
gradient descents did not improve 7 points
- doing minimization polishing (scipy)
Warning minimization failed for 5 points
but 394 samples were still better.
- time taken for polishing 45.94 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.04428 (s)
- smoothing results
- time taken for smoothing 0.02564 (s)
* calculating the 2D profile for: ns, omegab with indexes 3 4
- doing initial randomized search
- time taken for random algorithm 0.007644 (s)
- number of 2D bins 1024
- number of empty/filled 2D bins 317 / 707
- doing gradient descent pre-polishing
- time taken for gradient descents 37.58 (s)
at the end of descent after 100 steps 647 samples were still beeing optimized.
- doing minimization polishing (scipy)
Warning minimization failed for 4 points
but 703 samples were still better.
- time taken for polishing 74.29 (s)
- doing interpolation on regular grid
- time taken for interpolation 0.06176 (s)
- smoothing results
- time taken for smoothing 0.02763 (s)